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Abstract 



In this paper, we propose a method for the approximation of the solution of high-dimensional 
weakly coercive problems formulated in tensor spaces using low-rank approximation formats. 
I I The method can be seen as a perturbation of a minimal residual method with residual norm 

■^r corresponding to the error in a specified solution norm. We introduce and analyze an iterative 

*-^ algorithm that is able to provide a controlled approximation of the optimal approximation of 

^~i the solution in a given low-rank subset, without any a priori information on this solution. We 

^S^ also introduce a weak greedy algorithm which uses this perturbed minimal residual method 

^ for the computation of successive greedy corrections in small tensor subsets. We prove its 

a convergence under some conditions on the parameters of the algorithm. The residual norm can 

, , be designed such that the resulting low-rank approximations are quasi-optimal with respect 

to particular norms of interest, thus yielding to goal-oriented order reduction strategies for 
T-H the approximation of high-dimensional problems. The proposed numerical method is applied 

^ to the solution of a stochastic partial differential equation which is discretized using standard 

^>0 Galerkin methods in tensor product spaces. 

(N 
• Introduction 

j«^ Low-rank tensor approximation methods are receiving a growing attention in computational sci- 

^_H ence for the numerical solution of high-dimensional problems formulated in tensor spaces (see 

L| the recent surveys [26, 8, 25, 21] and monograph [22]). Typical problems include the solution 

of high-dimensional partial differential equations arising in stochastic calculus, or the solution of 

stochastic or parametric partial differential equations using functional approaches, where functions 

of multiple (random) parameters have to be approximated. These problems take the general form 



A{u) = b, ueX = Xi(g}...(g}Xd, (1) 

where A is an operator defined on the tensor space X. Low-rank tensor methods then consist in 
searching an approximation of the solution u in a low-dimensional subset Sx of elements of X of 
the form 

Y.---J2 "n...*.< . . . <, < e x^, (2) 

il id 

where the set of coefficients {ai-^,,,i^) possesses some specific structure. Classical low-rank ten- 
sor subsets include canonical tensors. Tucker tensors. Tensor Train tensors ]36] or more general 
tree-based Hierarchical Tucker tensors [23, 17[. These subsets possess nice approximation power 
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in the sense that they are able to approximate a large class of tensors with a reasonable preci- 
sion. Low-rank approximation methods are closely related to a priori model reduction methods in 
that they provide approximate representations of the solution on low-dimensional reduced bases 
{wl (E) ■ ■ ■ <E) wf } that are not selected a priori. 

The best approximation of ?i G X in a given low-rank tensor subset Sx with respect to a 
particular norm || • j|j!f in X is the solution of 

min ||?i — f ||j!f . (3) 

veSx 

Low-rank tensor subsets are not linear subspaces nor convex sets. However, they usually satisfy 
topological properties that give sense to the above best approximation problem and allows the 
application of standard optimization algorithms [37, 13, 40]. Of course, in the context of the 
solution of high-dimensional problems, the solution u of problem (1) is not available, and the 
best approximation problem (3) can not be solved directly. Tensor approximation methods then 
typically rely on the definition of approximations based on the residual of equation (1), which 
is a computable quantity. Different strategies have been proposed for the construction of low- 
rank approximations of the solution of equations in tensor format. The first family of methods 
consists in using classical iterative algorithms for linear or nonlinear systems of equations with 
low-rank tensor algebra (using low-rank tensor compression) for standard algebraic operations 
[27, 24, 30, 3[. The second family of methods consists in directly computing an approximation of 
u in Sx by minimizing some residual norm [4, 31, 11[: 

min \\Av — 6||^. (4) 

v€Sx 

In the context of approximation, where one is interested in obtaining an approximation with a 
given precision rather than obtaining the best low-rank approximation, constructive greedy algo- 
rithms have been proposed that consist in computing successive corrections in a small low-rank 
tensor subset, typically the set of rank-one tensors [28, 1, 31[. These greedy algorithms have been 
analyzed in several papers [2, 5, 14, 16, 6, 18[ and a series of improved algorithms have been 
introduced in order to increase the quality of suboptimal greedy constructions [33, 34, 29, 16, 19[. 

Although minimal residual based approaches are well founded, they generally provide low-rank 
approximations that can be very far from optimal approximations with respect to the natural norm 
II • II X, at least when using usual measures of the residual. If we are interested in obtaining an 
optimal approximation with respect to the norm || • ||x, e.g. taking into account some particular 
quantity of interest, an ideal approach would be to define the residual norm such that 

\\Av - b\\i, = \\u - v\\x, 

where || • ||x is the desired solution norm. Minimizing the residual norm would therefore be equiv- 
alent to solving the best approximation problem (3). However, the computation of such a residual 
norm is in general equivalent to the solution of the initial problem (1). 

In this paper, we propose a method for the approximation of the ideal approach. This method 
applies to a general class of weakly coercive problems. It relies on the use of approximations 
rs{v) of the residual r{v) = Av — b such that ||r5(u)||j, approximates the ideal residual norm 
||r(w)||i, ~ \\u — v\\x- The resulting method allows for the goal-oriented construction of low-rank 
tensor approximations which are quasi-optimal with respect to a norm || • \\x that can be designed 
according to some quantity of interest. We first introduce and analyze an algorithm for minimizing 
the approximate residual norm ||r5(f)||i, in a given subset Sx- This algorithm can be seen as an 
extension of the algorithms introduced in [7, 9[ to the context of nonlinear approximation in subsets 
Sx ■ It consists in a perturbation of a gradient algorithm for minimizing in Sx the ideal residual 
norm ||r(u)||j,, using controlled approximations rs(v) of the residual r(v) such that 



{l-6)\\u-v\\x<\\rs{v)\U<il + 6)\\u-v\\ 



X, 
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for some prescribed precision S. Also, we propose a weak greedy algorithm for the adaptive con- 
struction of an approximation of the solution of problem (1), using the perturbed ideal minimal 
residual approach for the computation of greedy corrections. A convergence proof is provided 
under some conditions on the parameters of the algorithm. 

The outline of the paper is as follows. In section 1, we introduce a functional framework for 
weakly coercive problems. In section 2, we briefly recall some definitions and basic properties of 
tensor spaces and low-rank tensor subsets. In section 3, we present a natural minimal residual 
based method for the approximation in a nonlinear subset Sx , and we analyze a simple gradient al- 
gorithm in Sx ■ We discuss the conditioning issues that restrict the applicability of such algorithms 
when usual residual norms are used, and the interest of using an ideal measure of the residual. In 
section 4, we introduce the perturbed ideal minimal residual approach. A gradient-type algorithm 
is introduced and analyzed and we prove the convergence of this algorithm towards a neighbor- 
hood of the best approximation in Sx- Practical computational aspects are detailed in section 
5. In section 6, we analyze a weak greedy algorithm using the perturbed ideal minimal residual 
method for the computation of greedy corrections. In section 7, a detailed numerical example will 
illustrate the proposed method. The example is a stochastic reaction-advection-diffusion problem 
which is discretized using Galerkin stochastic methods. In particular, this example will illustrate 
the possibility to introduce norms that are adapted to some quantities of interest and the ability 
of the method to provide goal-oriented (quasi-)best low-rank approximations. 

1 Functional framework for weakly coercive problems 

1.1 Notations 

For a given Hilbert space H, we denote by (•, ■)h the inner product in H and by \\-\\h the associated 
norm. We denote by H' the topological dual of H and by {■,-)h',h the duality pairing between 
H and H'. For v E H and ip E H' , we denote (p{v) — {(f, v)h',h- We denote by Rh : H ^^ H' the 
Riesz isomorphism defined by 

{v,w)h = {v,Rhw)hm' = {Rhv,w)h',h = {Rhv,Rhw)h' Vw,w; € H. 

1.2 Weakly coercive problems 

We denote by X (resp. Y) a Hilbert space equipped with inner product (•, ■)x (resp. (•, •)y) and 
associated norm || • \\x (resp. || • ||y). Let a : X xY ^^M.he a. bilinear form and let 6 e F' be a 
continuous linear form on Y . We consider the variational problem: find u E X such that 

a{u, v) = b{v) \Jv e Y. (5) 

We assume that a is continuous and weakly coercive, that means that there exist constants a and 
/3 such that 

a{v,w) . 

sup sup ^ 13 < +00, (6) 

veXweY \\v\\x\\w\\y 

a(v,w) 

mf sup = a > 0, (7) 

vexwev \\v\\x\\w\\y 

and 

sup ^^^ > \fw ^ in Y. (8) 

vex \\v\\x 

We introduce the linear continuous operator A : X ^ Y' such that for all {v, w) E X x Y, 

a(w,w) = {Av,w)y',y- 
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We denote by A* : F ^ X' the adjoint of A, defined by 

{Av,w)y',y = {v, A*w)x,x' y{v,w) Cz X X Y. 
Problem (5) is therefore equivalent to find u Cz X such that 

Au = b. (9) 

Properties (6), (7) and (8) imply that A is a norm-isomorphism from X to Y' such that for all 

veX, 

a\\v\\x<\\Av\\Y'<P\\v\\x (10) 

ensuring the well-posedness of problem (9)[12]. The norms of A and its inverse A^^ are such that 

\\A\\x^Y'= sup ^^=/3, 
o^vex \\v\\x 

and 

\\A^\\y'^x= sup II i'"' = inf ^^ =a\ 

o=tyeY' WvWy' \o^vex \\v\\x J 

The condition number of operator A is 

k{A) = \\A\\x^y'\\A-^\\y'^x^->1. 

a 

2 Approximation in low-rank tensor subsets 

2.1 Hilbert tensor spaces 

We here briefiy recall basic definitions on Hilbert tensor spaces (see [22]). We consider Hilbert 
spaces X^, ^ ^ fJ' ^ d, equipped with norms || • \\x and associated inner products (•,-)^^. We 
denote by <E)f^^iV^ ~ v^ (E) . . . (E) v"^, v^ € X^, an elementary tensor. We then define the algebraic 
tensor product space as the linear span of elementary tensors: 

d 

a^X^^ span{®;^^ii;^ : w^ G X^, 1 < ^ < d}. 

A Hilbert tensor space X equipped with the norm || • ||x is then obtained by the completion with 
respect to || • ||x of the algebraic tensor space, i.e. 



d " " d 

X = a^X, =||.||,(g)X^. 

Note that for finite dimensional tensor spaces, the resulting space X is independent of the choice 
of norm and coincides with the normed algebraic tensor space. 

A natural inner product on X is induced by inner products (•, •);^ in X^, 1 < ^i < d. It is 
defined for v — ®'t^iV^^ and w — ®'t=i'w^ by 



^=1 



^n. 



and extended by linearity on the whole algebraic tensor space. This inner product is called the 
induced (or canonical) inner product and the associated norm the induced (or canonical) norm. 



^e.g. Xfji, = R"** equipped with the Euclidian norm, or X^^ = 11^(0,^), k > 0, a Sobolev space of functions 
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2.2 Classical low-rank tensor subsets 

Low-rank tensor subsets Sx of a tensor space X = ||.|| ^ i X^^ are subsets of the algebraic tensor 
space a ^u=i -^t^j that means that elements v G Sx can be written under the form 



E "^.■•.^.(Sx.' (11) 



E 



where a — (ai)ig/ € M^, with / := /i x . . . x I^, is a set of real coefficients that possibly satisfy 



some constraints, and (w" )i^ei^ G (-'^m)^, for 1 < ^ < d, is a set of vectors that also possibly 
satisfy some constraints (e.g. orthogonality). 

Basic low-rank tensor subsets are the set of rank-r canonical tensors: 



and the set of Tucker tensors with rank r = (ri, . . . , r^): 

( ri re 

TAX) = ^ - E • • • E "-•■■- <=i < ^ < ^ ^^ 



A" 



«d = 



Other low-rank tensor subsets have been recently introduced, such as Tensor Train tensors [36] or 
more general tree-based Hierarchical Tucker tensors [23, 17[, these tensor subsets corresponding 
to a form (11) with a particular structure of tensor a. Note that for the case d = 2, all the above 
tensor subsets coincide. 

Remark 2.1. From a numerical point of view, the approximate solution of the variational problem 
(5) requires an additional discretization which consists in introducing an approximation space 
X = <E)'z^iXf^, where the X^ C X^ are finite dimensional approximation spaces (e.g. finite element 
spaces). Then, approximations are searched in low-rank tensor subsets Sx of X (e.g. TZr{X) or 
%.{X)), thus introducing two levels of discretizations. In the following, we adopt a general point 
of view where X can either denote an infinite dimensional space, an approximation space obtained 
after the discretization of the variational problem, or even finite dimensional Euclidian spaces for 
problems written in an algebraic form. 

2.3 Best approximation in tensor subsets 

Low-rank tensor approximation methods consist in computing an approximation of a tensor u ^ X 
in a suitable low-rank subset Sx of X. The best approximation of u in Sx is defined by 

min ||?i — v\\x. (12) 

v£Sx 

Classical tensor subsets are not linear subspaces nor convex sets. However, they usually satisfy 
properties that give sense to the above best approximation problem. We now consider that Sx 
satisfies the following properties: 

Sx is weakly closed (or simply closed in finite dimension), (13) 

Sx C -iSx for all 7 e M. (14) 

Property (14) is satisfied by all the classical tensor subsets mentioned above (canonical tensors. 
Tucker and tree-based Hierarchical Tucker tensors). Property (13) ensures the existence of so- 
lutions to the best approximation problem (12). This property, under some suitable conditions 
on the norm || • \\x (which is naturally satisfied in finite dimension), is verified by most tensor 
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subsets used for approximation (e.g. the set of rank-r canonical tensors for d = 2, the set of ele- 
mentary tensors TZi for d > 2 [14], the sets of Tucker or tree-based Hierarchical Tucker tensors [15]). 

We then introduce the set-valued map II^^ : X -^ 2^^ that associates to an element u E X 
the set of best approximations of u in Sx '■ 

n^x (") — ^I'g ™ii^ II" ^ ^\\x- (15) 

veSx 

Note that if Sx were a closed linear subspace or a closed convex set of X, then II^^ (u) would 
be a singleton and Tlsx would coincide with the classical definition of the metric projection on 
Sx- Property (14) still implies the following property of projections: for all u € X and for all 

h - Mix = \Mx - \\M\x with \\w\\x ^ o-{v;Sx) = max —^ — . (16) 

zeSx \\z\\x 

Ilg^{v) is therefore a subset of the sphere of radius a{v;Sx) in X. In the following, we will use 
the following abuse of notation: for a subset S d X and for w Cz X , we define 

\\S — w\\x ■— sup \\v — w\\x 
ves 

With this convention, we have ||n5x(f)||j!f = a{v]Sx) and 

msxiv)-v\\x = Mx-msAv)\\x- (17) 

3 Minimal residual based approximation 

We now consider that problem (9) is formulated in tensor Hilbert spaces X = \\.\\^ 'S)n=i ^m ^'^^ 

Y = jj.ji^ (2)u=i ^A*- ^^^ ^™ ^^ '^^'^^ ^° ^^'^ ^^ approximation of the solution u of problem (9) in 
a given tensor subset Sx C X. 

3.1 Best approximation with respect to residual norms 

Since the solution u of problem (9) is not available, the best approximation problem (12) can not 
be solved directly. However, tensor approximations can be defined using the residual of equation 
(9), which is a computable information. An approximation of u in Sx is then defined by the 
minimization of a residual norm: 

min \\Av — 6||y/ = min \\A(v — u)\\y'- (18) 

v£Sx v&Sx 



Assuming that we can define a tangent space Ty{Sx) to Sx at w G Sx, the stationarity condition 
of functional J : v i-^ ll^l"*^ ^ ")lly' at u G Sx is 

{J'{v),Sv)x'.x^O ysveniSx), 

or equivalently, noting that the gradient of J at u is J'{v) ~ A* Ry^ {Av — fo) G X', 

{Av-b,A6v)Y' =0 ySvemSx). 

3.2 Ideal choice of the residual norm 

When approximating u in Sx using (18), the obtained approximation depends on the choice of 
the residual norm. If we want to find a best approximation of u with respect to the norm || • \\x, 
then the residual norm should be chosen such that 

\\A{v-u)\\y' = \\v-u\\x yveX, 
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or equivalently such that the following relation between inner products holds: 

{v,w)x ^ {Av,Aw)y' '^v,weX. (19) 

This implies 

{v,w)x = {Av,Ry^Aw)y\y = {v,R'x^A*Ry^Aw)x, 

for all v,w Cz X, and therefore, by identification, 

Ix = Rx^A*Ry^A ^Ry = AR-^A* ^ Rx = A*R-^A. (20) 

Also, since 

{v,w)y = {Ryv,w)y'.y = {ARx A*v,w)y'.y 
= {R-]^^A*v,A*w)x.x' = {A*v,A*w)x' 

for all VjW eY, we also have that (19) is equivalent to the following relation: 

{v,w)y ^ {A*v,A*w)x' yv,weY. (21) 

Note that (19) and (21) respectively impose 

\\v\\x = \\Av\\y' and \\w\\y = \\A*w\\x'. (22) 

This choice implies that the weak coercivity and continuity constants are such that a = /3 = 1, 
and therefore 

KiA) = 1, 

meaning that problem (9) is ideally conditioned. 

In practice, we will first define the inner product (•, ■)x and the other inner product (•, •)y will 
be deduced from (21). 

Example 3.1. Consider that X ~ Y and let A = B + C with B a symmetric coercive and 
continuous operator and C a skew- symmetric operator. We equip X with inner product {v,w)x — 
{Bv,w)x',x, which corresponds to Rx = B. Therefore, 

My = U^v^x, = \\Bv\\j,, + WCvWJ,, = ll^lll + WCvWJ,,. 



v\\y corresponds to the graph norm of the skew- symmetric part C of operator A. When C — Q, 

Y — IPIIx- 



we simply have ||f "^ 



Example 3.2 (Finite dimensional problem). Consider the case of finite dimensional tensor spaces 
X — Y = ]U"ix---x"d^ g g^ after a discretization step for the solution of a high- dimensional partial 
differential equation. The duality pairings are induced by the standard canonical inner product. We 
can choose for (•, ■)x the canonical inner product on i]j"ix---x"(i^ which corresponds to Rx = Ix, 
the identity on X . Then, inner product on Y is defined by relation (21), which implies 

{v,w)y — {A*v, A*w)x and Ry = AA* . 

3.3 Gradient-type algorithm 

For solving (18), we consider the following basic gradient-type algorithm: letting u" = 0, we 
construct a sequence {u''}k>o in Sx and a sequence {y''}k>o in Y defined for fc > by 



RyHAu'^-b) 

11 . (23) 

ensAu''-pRx'A*y') 
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with p > 0. Equations (23) yield 



u 



fc+i c rr„ (o, _L R (o,k 



eUs^iu + Bpiu^-u)), 



with Bp = Ix — pR^A*Ry^A a symmetric operator from X to X. For all v <^ X, 

{Bpv,v)x , ||Awii2 



Ml ' \\v\W 

From (10), we deduce that the eigenvalues oi Bp are in the interval [1 — p/3^, 1 — pa^]. The spectral 
radius of Bp is therefore bounded by 

7(p)==max{|l-p/32|,|l-pa2|}. 
Proposition 3.3. Assuming "f{p) < 1/2, the sequence {u''}k>i defined by (23) is such that 

M -u\\x< (27)1|w° -u\\x + Y^ll" - n^xMIU (24) 

and 

lim sup M - u\\x < -^\\u-IlsA^)\\x (25) 

Proof. Denoting v^ = u^ — u, we have 

||u'+' - u|U < lin^x (" + Sp«') - «llx 

< ||n5^(M + Bpw'^) -{u + Bpv'')\\x + WBpv'^Wx 
<\\w-{u + Bpv'')\\x + \\BpV^\\x 

for all w G 5j!f. In particular, this inequality is true for all w G Ils^{u), and therefore, taking the 
supremum over all w G ^Sx ("); ^^ obtain 

||w'+' -u\\x< \\Us, (u) - (u + Bpv'')\\x + WBpv'^Wx 
<\\nsxiu)-u\\x+2\\Bpv''\\x 
Since ||i3pw||x < 7||w||x for all w G X and since 27 < 1, we have 
\\u^+' - u\\x < WHsx {u) -u\\x + 27||w - u'^l 



X 

< i2jr+'\W' -u\\x+ ^ '^'/^' \\u UsAu)\\x 



1-27 
from which we deduce (24) and (25). D 



The condition j{p) < 1/2 imposes £ < a/3 and p G (5^7 sf^)' ^^^ condition £ < \/3 is a 
very restrictive condition which is in general not satisfied without an excellent preconditioning of 
operator A. 

However, with the ideal choice of norms introduced in the previous section (equation (22)), we 
have a = f3 = 1 and Bp = (1 — p)Ix- That means that the problem is ideally conditioned and we 
have convergence for all p G [| , |] towards a neighborhood of Hsx (") of ^i^^ i^J^ 11"" ^ ^Sx (") ll-f 
with 7 = |1 — p|. 

Corollary 3.4. Assume that (22) is satisfied. Then, if p E [5^ |]; the sequence {u'^}k>i defined 
by (23) verifies (24) and (25) with j{p) = |1 — p|. Moreover, if p = 1, then u^ G Ils^{u), that 
means that the algorithm converges in one iteration whatever the initialization vP . 

Note that the choice of the ideal norm is equivalent to the direct solution of the initial problem 
(since it involves the inverse of the mapping Ry = AR'^A*) so that the above ideal algorithm 
can not be used in practice. In the following section, we introduce a computable perturbation of 
this ideal algorithm. 
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4 Perturbation of the ideal approximation 

We now consider that function spaces X and Y are equipped with norms satisfying the ideal 
condition 

\\Av\\y' = \\v\\x V-y e X. (26) 

The solution of problem (18) using this ideal choice of norms is therefore equivalent to the best 
approximation problem (12), i.e. 

min \\Av — bWy = niin \\v — u\\x- (27) 

v£Sx v£Sx 

Unfortunately, the computation of the solution of (27) would require the solution of the initial 
problem. We here propose to introduce a computable perturbation of this ideal approach. 

4.1 Approximation of the ideal approach 

The idea is to replace problem (27) by the following problem: 

mm\\A\Ry\Av-b))\\Y, (28) 

v<£Sx 

where A : y — > 1" is a mapping that provides an approximation A (r) of the residual r = 
Ry^{Av — b) (zY with a controlled relative precision S > 0, i.e. 

||A^(r)-r||y<(5||r||y. 

A practical definition of the mapping A'' will be detailed in section 5. We will then assume that 
the mapping A is such that: 

\\^\y)-y\\Y<S\\y\\Y, yyeVY = {RY\Av-by,veSx}. (29) 

As we will see in the following algorithm, it is sufficient for A* to well approximate residuals that 
are in the subset Vy (and not in y \ Py) whose content depends on the chosen subset Sx and on 
the operator and right-hand side of the problem. 

4.2 Quasi-optimal approximations in Sx 

Here we consider the case where we are not able to solve the best approximation problem in Sx 
exactly, because there is no available algorithm for computing a global optimum, or because the 
algorithm has been stopped at a finite precision (see section 5.1 for practical comments). We 
introduce a set of quasi-optimal approximations 11^ (u) C Sx such that 

\\u-m,J^^)\\x<vh-^Sx{^)\\x (^>l). (30) 

Remark 4.1. Note that by introducing this new perturbation, we are able to remove the assumption 
that Sx is closed and to handle the case where the problem (27) does not have a solution, i.e. 
n^x (") = ^- ^^ ^^^^ case, we have to replace \\u — Ils^(u)\\x by ini^^s^ \\u — w\\x in equation 
(30). 

Remark 4.2. Note that if Sx denotes a low-rank subset of an infinite dimensional space X , 
additional approximations have to be introduced from a numerical point of view (see remark 2.1). 
These additional approximations could be also considered as a perturbation leading to quasi-optimal 
approximations, where rj takes into account the approximation errors. In numerical examples, we 
will not adopt this point of view and we will consider X as the approximation space and the 
approximate solution in X of the variational problem will serve as a reference solution. 
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4.3 Perturbed gradient-type algorithm 

For solving (28) , we now introduce an algorithm which can be seen as a perturbation of the ideal 
gradient-type algorithm (23) introduced in section 3.3, with p = 1. Letting u'^ = 0, we construct 
a sequence {u'^}k>o C Sx and a sequence {y''}k>a C Y defined for A: > by 



y'' ^ A\R7.\Au'' - b)) 



u'+'en^'{u^-R^'A*y'^) 



Proposition 4.3. Assume (26), (29), and (30), with 6(1 + 1]) < 1. Then, the sequence {?i'^}fe>i 
defined by (31) is such that 

Wu'' -u\\x< ((1 + 1)5)"^" u\\x + Y^J^YTv)^^"" "^" ^"^"'^ ^^^^ 

and 

lim sup \\u''-u\\x < jir—^\\u-nsx{u)\\x (33) 

fe^oo l-()(l + ?7) 

Proof. Equation (31) can also be written 

u'^+i elll^iu + B^iu'' -u)) 

with B^{v) = V — K^ A* K^ [Ry^ A{v)) . Denoting v^ = u^—u, and following the proof of proposition 
3.3, we obtain 

h'+' -u\\x< ||n^, {u + SV) -{u + i?V)||x + WBv^Wx 

< vmsju) - (u + bV)||x + ||sV||x 

< rj\\nsx (u) - m||x + (1 + »7)||sV||x 

Moreover, using (26) and (20), we have 

llsVlIx = lli^'^ - Rx'A*A'{Ry'Av'')\\x 

= \\Av^ - ARx'^A*A\Ry^Av'')\\y' 
= WRy^Av'' - A\R-^Av'')\\y. 

Noting that Ry^Av'^ = Ry^lAu'' — b) belongs to subset Vy, we deduce from assumption (29) and 
equation (26) that 

WB'v'^Wx < SWRy'Av'^WY = SWyf^Wx. 
Denoting 5^ = (5(1 + rj), we finally have 



X < vW^-Sx (") - u\\x + Sr,\\u'' - u\\ 



w''+^ - u\\x S m\>->-Sx W - nx + o^\\u'^ - u\\x 



< s';+'\\u" -u\\x + »? V^ll" - n^.NIU' 

from which we deduce (32) and (33). D 



Comments We note that the sequence converges towards a neighborhood of II^^ (u) whose size 
is ^^^li^lg \\u — Ilsx{u)\\x- Indeed, (32) imphes that 

\\u - Usx{u)\\x < ||« - wllx < (1 + 7fe)l|w - ^Sx{u)\\x, (34) 
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with limsupj._^g^ jk < ^^_(i+ )s ■ Therefore, the sequence tends to provide a good approximation 
of the best approximation of u in Sx, and the parameters 6 and rj controls the quahty of this 
approximation. Moreover, equation (32) indicates that the sequence converges quite rapidly to 
this neighborhood. Indeed, in the first iterations, when the error J|m — m'^||x is dominated by the 
first term ((1 + ri)d)''\\u — u''*\\x, the algorithm has at least a linear convergence with convergence 
rate (1 + ri)d (note that for ?7 ~ 1, the convergence rate is very high for small S). Once both error 
terms are balanced, the error stagnates at the value i^n'L -ig ll^ ^ n5^(u)||x- Note that when 
i5 — >■ 0, we recover an ideal algorithm with a convergence in only one iteration to an element of the 
set n^ (u) of quasi-best approximations of u in Sx ■ 

Remark 4.4. Even if Sx is chosen as a subset of low-rank tensors, the subset Vy defined in 
(29) possibly contains tensors with high ranks (or even tensors with full rank) that are not easy 
to approximate with a small precision S using low-rank tensor representations. However, the 
algorithm only requires to well approximate the sequence of residuals {Ry {Au — b)}k>o C 'Dy, 
which may be achievable in practical applications. 

4.4 Error indicator 

Along the iterations of algorithm (31), an estimation of the true error \\u — u'^Wx can be simply 
obtained by evaluating the norm ||y'^||r of the iterate y'^ = A^{r'^) with r'^ = Ry^i^Au'' — b). 
Indeed, from property (29), we have 

il-S)\\y\\y<\\A'iy)\\y<il + S)\\y\\y, (35) 

for all y G Vy. Therefore, noting that r'^ e Vy and ||?''^||y = \\A{u — u'')\\yi = \\u — u''\\x, we 
obtain 

{l-5)\\u-u''\\x<\\y''\\y<{l + S)\\u-u''\\x. (36) 

In other words, 

^' = Y^II/I|y (37) 

provides an error indicator of the true error jju — u'^||j!f with an effectivity index t'^ == iiu-u''ll — ^ 
(1, Yzf), which is very good for small S. 

Moreover, if A'' is an orthogonal projection onto some subspace Y^ C Y, we easily obtain the 
following improved lower and upper bounds: 

Vl-S^u -u>'\\x< h^Wy < \\u - u'^Wx, (38) 

that means that the following improved error estimator can be chosen: 

^^ = ^ll/llv, (39) 

5 Computational aspects 

5.1 Best approximation in tensor subsets 

We here discuss the available algorithms for computing an element in 115^(1;), that means for 
solving 

min \\v — w\\x, (40) 

weSx 

where u is a given tensor in the tensor space X = ||.||^ (S'u=i -^m equipped with norm \\ ■ \\x, and 
where Sx is a given tensor subset. Note that except for the case where d = 2 and || • ||x is the 
induced (canonical) norm, the computation of a global optimum is still an open problem. 



with effectivity index f'^ = j, — S^— € (1, —7= 
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Canonical norm, d = 2. For the case d — 2, we first note that all classical low-rank tensor 
formats coincide with the canonical format, that means Sx = Tlm{X) for some rank m. When 
the norm || • |jx is the canonical norm, then Um G H^^ (u) coincides with a rank-rn singular value 
decomposition (SVD) of u (which is possibly not unique in the case of multiple singular values). 
Moreover, (7{u; Sx)^ — W^Sx (^)llx i*' ^he sum of the squares of the m dominant singular values of 
u (see e.g. [14]). Efficient algorithms for computing the SVD can therefore be applied to compute 
an element in Tlsx (^) (^ best approximation). That means that the algorithm (31) can be applied 
with rj = 1. 

Canonical norm, d > 2. For d > 2 and when the norm || • ||x is the canonical norm, different 
algorithms based on optimization methods have been proposed for the different tensor formats 
(see e.g. [26, 22[ for recent reviews). Very efficient algorithms based on higher order SVD have 
also been proposed in [10[, [20[ and [35[, respectively for Tucker, Hierarchical Tucker and Tensor 
Train tensors. Note that these algorithms provide quasi-best approximations (but not necessarily 
a best approximation) satisfying (30) with rj only function of the dimension d (see [22[). These 
quasi-best approximations can be used as initializations of optimization algorithms in order to 
obtain better approximations (i.e. with smaller precision 77). 

General norms, d> 2. For a general norm || • \\x, the computation of a global optimum to the 
best approximation problem is still an open problem for all tensor subsets, and methods based on 
SVD can not be applied anymore. However, classical optimization methods can still be applied 
(such as Alternating Minimization Algorithm) in order to provide an approximation of the best 
approximation [37, 40, 13[. We do not detail further these computational aspects and we suppose 
that algorithms are available for providing an approximation of the best approximation in Sx such 
that (30) holds with a controlled precision rj, arbitrarily close to 1. 

5.2 Application of mapping A*^ 

At each iteration of the algorithm (31), we have to compute y'^ = A''(r'^), with r'' = Ry^lAu'^ — b) e 
Y, such that it satisfies 

||/-r'=||y<<5||r'=||y. (41) 

We here explain how this can be achieved. First note that r*^ is the unique solution of 

min||r-i?^^(^u'=-fe)||^. (42) 

Therefore, computing y'^ is equivalent to solving the best approximation problem (42) with a 
relative precision 5. Noting that 

\\r - Ry^Au'' - 6)11^ = ||r|||, - 2{Au'' - h,r)Y'.Y + \\Ry\Au'' - b)\\l, 

we have that r^ (zY is equivalently characterized by the variational equation 

{r^,5r)Y = {Au^ - b, Sr)Y>^Y V(5r G Y, 

or in an operator form: 

RYr'' = Au'' - b, (43) 

where the Riesz map Ry = AR^ A* is a positive symmetric definite operator. 



Tensor approximation based on ideal minimal residual formulations 13 



5.2.1 Low-rank tensor methods 

For solving (42), we can also use low-rank tensor approximation methods. Note that in general, 
II • ||y is not an induced (canonical) norm in Y, so that classical tensor algorithms (e.g. based on 
SVD) can not be applied for solving (42) (even approximatively) . Different strategies have been 
proposed in the literature for constructing tensor approximations of the solution of optimization 
problems. We could introduce a tensor subset Sy C Y and compute an approximation of r^ in Sy 
using optimization algorithms applied to problem (42), or iterative solvers using classical tensor 
approximations applied to equation (43) [27, 24, 30, 3]. However, we need an adaptive strategy in 
order to satisfy the required precision S. 

A first strategy would consist in constructing a succession of approximations {y'^}m>i in an 
increasing sequence of tensor subsets {5™}m>i until the precision 6 is reached. Here, we rather 
adopt another (more adaptive) strategy which consists in keeping the subset Sy fixed (possibly 
small) and in applying greedy algorithms for approximating r*^ (see e.g. [16]). 

5.2.2 A possible algorithm 

We use the following algorithm for the construction of a sequence of approximations {ym}m>o- 

Let yg = and let Zq be an initial linear subspace. Then, for each ?Ti > 1, we proceed as 
follows: 

• compute an optimal correction ui^^ of y'^_i in Sy: 

wt e arg min \\yt-i +w- r^\\y, 

• define a linear subspace Z^ such that Z^-^_-^ C Z^ and w^ G Z^, 

• compute y^ as the best approximation of r^ in Z^: 

yt = arg min \\y-r''\\y. 

The convergence proof for this algorithm can be found in [16[. The convergence ensures that the 
precision S can be achieved after a certain number of iterations.^ 

As a stopping criterion, we use a heuristic error estimator based on stagnation. The algorithm 
is stopped at iteration m if 

p _ \\ym -Vm+pWY , , 

m ~ II fc II — ' ^ ' 

WVni+pWY 

for some chosen p > 1 (typically p — 1). Note that for p sufficiently large, y^+p can be considered 
as a good estimation of the residual r'^ and the criterion writes \\r'^ — y^lW < '^Ik'^lli'i which is 
the desired property. This stopping criterion is quite rudimentary and should be improved for a 
better control of the algorithm. 

5.2.3 Remark on the tensor structure of Riesz maps 

We consider that operator A and right-hand side b admit low-rank representations 






•^Note however that a slow convergence of these algorithms may yield to high rank representations of iterates 
y^, even for a low-rank subset Sy- 
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We suppose that a norm || • \\x has been selected and corresponds to a Riesz map Rx with a 
low-rank representation: 

rx 
i=l 

The ideal choice of norm || • ||y then corresponds to the following expression of the Riesz map Ry- 

TA rx rA 

Ry^AR-/A* = (E®^=i^r)(E®^=ii?r)-^(E®;:.iAf*). 

i—l i—1 i—1 

Note that the expression of Ry can not be computed explicitly (Ry is generally a full rank tensor) . 
Therefore, in the general case, algorithms for solving problem (43) have to be able to handle an 
implicit formula for Ry. However, in the particular case where the norm || • ||x is a canonical norm 
induced by norms || • \\^ on X^, the mapping Rx is a rank one tensor Rx = (d't^iRx , where Rx 
is the Riesz map associated with the norm || • ||^ on X^. Ry then admits the following explicit 
expression: 

rA rA 

Ry ^ AR-^^A* == EE<=i(^"^3.:^r)- 
i=i j=i 

In the numerical examples, we only consider this simple particular case. Efficient numerical meth- 
ods for the general case will be proposed in a subsequent paper. 

5.3 Summary of the algorithm 

Algorithm 1 provides a step-by-step outline of the overall iterative method for the approximation 
of the solution of (27) in a fixed subset Sx and with a chosen metric || • \\x- Given a precision 
S, an approximation of the residual is obtained with a greedy algorithm using a fixed subset Sy 
for computing successive corrections. We denote by e(j/^,r'^) an estimation of the relative error 
hi - r''\\Y/\\r''\\y, where r'= = Ry^iAu'' - b). 

Algorithm 1 Gradient-type algorithm 



Set mO = and Z^ = ; 
for k = Q to K do 

Compute the projection j/q = arg min \\y — r ||y ; 

Set m — Q ] 

while e{yi,r'') < 5 do 

ra = m + 1 ; 

Compute a correction w^ G arg min ||j/„_]^ -\- w — r \\y 

w^Sy 

Set Z^ == Z,';_i + spanju;^} ; 

Compute the projection y^ = arg min ||y — r ||y 

end while 



yez^ 



Compute u'^+i e Hi; [u^ - R^^A*y^^) ; 



'^Sx 

end for 



6 Greedy algorithm 

In this section, we introduce and analyze a greedy algorithm for the progressive construction of a 
sequence {■Um}m>Oi where Um is obtained by computing a correction of Um-i in a given low-rank 
tensor subset Sx (typically a small subset such as the set of rank-one tensors TZi{X)). We here 
consider that approximations of optimal corrections are obtained with the approach presented in 
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section 4. It results in an algorithm which can be considered as a modified version of weak greedy 
algorithms [39]. 

Here, we assume that the subset Sx verifies properties (13) and (14), and that span(5x) is 
dense in X (which is verified by all classical tensor subsets presented in section 2.2). 

6.1 A weak greedy algorithm 

We consider the following greedy algorithm. Given uq = 0, we construct a sequence {um}m>i 
defined for m > 1 by 

Um^Um-l+Wm, (45) 

where w^ G Sx is a correction of Um-i satisfying 

\\u- Um-i -Wm\\x < (1 +7m) min \\u - Um-1 -w\\x, (46) 

wESx 

with "fm a sequence of small parameters. 

Remark 6.1. A Wm satisfying (46) can be obtained using the gradient type algorithm of section 
4 that provides a sequence that satisfies (34). Given the parameter S = 5^ in (31), property (46) 
can be achieved with any 7„i > i_^ ■ 

6.2 Convergence analysis 

Here, we provide a convergence result for the above greedy algorithm whose proof follows the lines 
of [39] for the convergence proof of weak greedy algorithms'^. 

In the following, we denote by f„i — u—Um- For the sake of simplicity, we denote by || • || — \\-\\x 
and (•, •) = (•, ■)x and we let w„ e ^Sxifm-i), for which we have the following useful relations 
coming from properties of best approximation problems in tensor subsets (see section 2.2): 

\\frn-l-Wm\\^^\\fm-l\\^-\\Wm\\^ and || Wm |P == (/m-1 , Wm)- (47) 

We introduce the sequence {am}m>i defined by 

\\fm-l-W,n\\ , . 

a-m = TT? n • (48) 

ll./m-l| 

Lemma 6.2. Assuming that for all m > 1 we have 

(l+7m)am<l, (49) 

the sequence {||/m||}m>i converges. Furthermore, it is possible to define a positive sequence 
{Km}rn>i defined by 

Jl r,\jm—l:W„i) 

\\Wm\\' 

and we have {Km||{ym||}m>i G £^. 



^rn-^ '.~,2 -^^ (50) 



Proof. From (45) and (46), we have 

ILfmll == ll/m-1 -WmW < (1 +7m)ll/m-l - Wm\\ = (1 + Tm)"™ || /m-1 1| • 

Under assumption (49), {||/m||}m>i is a decreasing and positive sequence and therefore converges. 
Moreover, since 

Wfm-l - WrnW^ = \\fm-l\\^ - (2(/„_i , W„i) - ||w„i|p) < ||/m-l||^, 



^Note that the condition (46) on the successive corrections does not allow to directly apply the results on classical 
weak greedy algorithms. 
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it follows that 2{f„i^i,Wm) > H'S'mlP- Therefore, «„ can be defined by (50) and we have 

m 
II -(■ ~ I|2 II J- ||2 2 II ~ ||2 II /• ||2 V^ 2||~ ||2 

\\.frn-l-W,n\\ =||./m-l|| -tmll^mll =ll/o|| ^ 2^ 'i^\\w^\\ , 

i=l 

that ends the proof. D 

We now provide a result giving a relation between ||wm|| and Hwmll- 



,2 _ 1 - {l + jmf al 
' ^ 1 - a2 



Lemma 6.3. Assume (49) holds and let ji^ = — G [0, 11. Then, we have 

1 — a 

m 
A^mllWrnll < Hm\\Wm\\ < ||M^m||, (51) 

and 

^<Km. (52) 

Proof. From inequality (46) and from the optimality of Wm , it comes 

Wfm-l -WmW^ < Wfrn-l ^ WmW^ < (1 +7m)^ll/m-l -l^mlP 
^ ||.fm-l||' - \\w,nf < \\.fm-lf - ^^U^mf < (1 + 7m)'a^ || /m-1 1| ' 
^ (1 - (l+7m)'a™)||/m-l||' < «L||W„||2 < \\w,J^ 

Using |l/„_i|p == \\.fni-i-w^\\^ + \\w„,\\^ ==amll/m-i|P + ||w™|p, and using the definition of /x„, 
we obtain (51). In addition, from the optimality of Wm, we have {-u^f^, fm-i) < ( i|^'"i| , ./m-i) = 

lluimll, or equivalently ^^^ — ll^mll !i ll'^^mll- Combined with (51), it gives ^^^ — < l-jpj < ^^^, 
which implies (52). D 

Lemma 6.4. Assume (49) and that {Mm}„i>i ^■^ such that X]m=i Mm = °^- Then, if {fm}m>i 
converges, then it converges to zero. 

Proof. Let us use a proof by contradiction. Assume that /,„ — > / 7^ as jti — > 00, with / G X. As 
Sx is dense in X, then there exists e > such that sup^g^^ |(/, ct)| > 2£. Using the definition 
of Wm and of / as a limit of /„, we have that there exists iV > such that 

V 

||w;„|| == sup |(/„_i, -j—j-)! > e, \/m>N. (53) 

Thanks to (51), we have 

II -f I|2 II J- ||2 II ~ ||2 2 / II -f l|2 II ||2 2 

||,/m|| = ||./m-l|| -\\Wrn\\ Km<\\jrn-l\\ -||w™||^„, 

m m 

<\\fNr- E i^i\M'<\\fNr-e' Y. ^^l 

i=N+l i=N+l 

which implies that {^im}m>o G ^^, a contradiction with the assumption. 

D 

Proposition 6.5. Assume (49). Further assume that the sequence fim is non increasing and 

verifies 

00 2 

E — =«^- (54) 



m 

m—1 



Then the sequence {um}m>i converges to u. 
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Proof. Let two integers n < m and consider 

Win Jm\\ 1 1 /nil ll/r?i|| ^\Jn Jm^ Jni) • 

Noting 6n,m = \{fn ^ fm, frn)\ and using Lemma 6.3, we obtain 



m m II — W ''^ 

i=n+l i=l A^m+1 j^i 



W, 



Lemma 6.2 implies that Km||wm|l S ^^- Together with assumption (54), and using Lemma 2.8 
in [38], we obtain that liminfm_j.oo inax„<m 0„_m = 0. Lemma 2.8 in [38] then proves that the 
sequence {fm}m>i converges. Noting that (54) imphes that {lJ'm}m=i ^ ^^i Lemma 6.4 allows to 
conclude the proof. D 

7 Numerical example 

In this section, we apply the proposed method to the numerical solution of a stochastic reaction- 
advection-diffusion problem. 

7.1 Stochastic reaction-advection-diffusion problem 

We consider the following steady reaction-advection-diffusion problem on a two dimensional unit 
square domain il = [0, 1]^ (see Figure 1): 

— Au + c • V?i + au = f in Q, (55) 

M = on 917. 

The advection coefficient c and the reaction coefficient a are considered as random and are given by 
c = CiCo and a ~ cxp(^2), where ^i ^ f7(— 350,350) and ^2 ~ C^(log(0.1),log(10)) are independent 
uniform random variables, and co(a;) = {x2 — 1/2,1/2 — xi), x = (2:1,2:2) € fi. We denote 
by Si ==]-350, 350[ and S2 =] log(O.l), log(10)[, and we denote by (S,Z?(S),P^) the probability 
space induced by ^ = (^11^2), with S = Si x S2 and P^ the probability law of ^. The external 
source term / is given by f{x) = ^01(2:) ^ ^^2(2^)1 where fii =]0.45, 0.55[x]0.15, 0.25[ and il2 — 
]0.45,0.55[x]0.75, 0.85[, and where Iq^, denotes the indicator function of O^. 
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Figure 1: Example : reaction-advection-diffusion problem. 

Let V — HQ{il) and 5* — i^(S, dP^). We introduce approximation spaces Vn C V and Sp C S, 
with N — dim(V/v) and P = dim(S'p). Vn is a Qi finite element space associated with a uniform 
mesh of 1600 elements such that N — 1521. We choose Sp — S'Ij (E) S*!^, where S'Ij is the space 
of piecewise polynomials of degree 5 on Si associated with the partition {]-350, 0[,iO, 350[} of Si, 
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and Sj,^ is the space of polynomials of degree 5 on S2. This choice results in P = 72. The Galerkin 
approximation u G Vn (E) Sp C V (E) S oi the solution of (55) is defined by the following equation^: 

/ / {Vu-Vv + c-Vuv + auv)dx dP^= / / f v dx dP^, (56) 

for all V e Vat (g) S'p. Letting Vn (^ Sp = span{(pi (^ ^pj;l < i < N,l < j < P}, the Galerkin 
approximation u = ^j^-^^--^ Uijipi®^j can be identified with its set of coefficients on the chosen 
basis, still denoted u, which is a tensor 

w e X = M^ «) M^ such that Au = b, (57) 

where b = 6^ (g) 6^, with bf = J^ fifi and 6^ = /_. ipjdP^, and where ^ is a rank-3 operator such 
that A = D^ (^ M« + C^ (» if «i + i?^ (» ii«2 , with D^^ = J^ Vipi ■ Vifkdx, Cf,^ = J^ (piCo ■ Vcpkdx, 
^ik = Ji^'Pt'Pkdx, M| = J^i:j{y)iji{y)dP^{y), H^^ = J^yn^j{y)ilji{y)dP^{y), n = 1,2. Here, we 
use orthonormal basis functions {"0^} in Sp, so that M^ = Ip, the identity matrix in 



fP 



7.2 Comparison of minimal residual methods 

In this section, we present numerical results concerning the approximate ideal minimal residual 
method (A-IMR) applied to the algebraic system of equations (57) in tensor format. This method 
provides an approximation of the best approximation of u with respect to a norm || • \\x that can 
be freely chosen a priori. Here, we consider the application of the method for two different norms. 
We first consider the natural canonical norm on X, denoted || • II2 and defined by 

N P 

Ml2-EE(^'^-)'- (58) 

This choice corresponds to an operator Rx — Ix = In '^ Ipi where /jv (resp. Ip) is the identity 
in R^ (resp. IR-'^). We also consider a weighted canonical norm, denoted || • W^ and defined by 

N P 

MI^-EE(^(^^)^»^-)', (59) 

i=l J=l 

where w : f7 — > M is a weight function and the Xi are the nodes associated with finite element shape 
functions ipi. This norm allows to give a more important weight to a particular region D C fi, 
that may be relevant if one is interested in the prediction of a quantity of interest that requires a 
good precision of the numerical solution in this particular region (see section 7.2.3). This choice 
corresponds to an operator Rx = D^ (E) Ip, with D^ = diag(w(a::i)^, . . . , w{xn)'^). 

The A-IMR provides an approximation u G Sx of the || • ||x-best approximation of u in Sx 
(that means an approximation of an element in H^^ (w)), where || • |jx is either || • II2 or || • ||^. The 
set Sx is taken as the set Tlr{X) of rank-r tensors in X = M^ (g)K^. The dimension of X is about 
75,000 so that the exact solution u of (57) can be computed and used as a reference solution. We 
note that both norms are induced norms in K^ (g) M^ (associated with rank one operators Rx) so 
that the || • |]x-best approximation of u in Sx is a rank-r SVD that can be computed exactly using 
classical algorithms (see section 5.1).^ For the construction of an approximation in TZr{X) using 
A-IMR, we consider two strategies: the direct approximation in 7?,r(X) using Algorithm 1 with 
Sx = Tlr{X), and a greedy algorithm that consists in a series of r corrections in Tli{X) computed 
using Algorithm 1 with Sx = Ii-i{X) and with an updated residual b at each correction. 



^The mesh Peclet number is sufficiently small so that an accurate Galerkin approximation can be obtained 
without introducing a stabilized formulation. 

^Note that different truncated SVD are obtained when M.^ is equipped with different norms. 
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The A-IMR will be compared to a standard approach, denoted CMR, which consists in mini- 
mizing the canonical norm of the residual of equation (57) , that means in solving 



min \\Av ■ 

v€Sx 



b\[. 



(60) 



This latter approach has been introduced and analyzed in different papers, using either direct 
minimization or greedy rank-one algorithms [4, 11, 2], and is known to suffer from ill-conditioning 
of operator A. We note that this approach corresponds to choosing Rx =A*A and Ry — Ix — 
In (8) Ip- 



7.2.1 Natural canonical norm | • II2 

First, we compare both greedy and direct algorithms for || • \\x = |1 • II2, using either CMR or 
A-IMR with different precisions S. The convergence curves with respect to the rank are plotted 
on Figure 2, where the error is measured in the || • II2 norm. Concerning the direct approach, 
we observe that the different algorithms have roughly the same rate of convergence. The A-IMR 
convergence curves are close to the optimal SVD for a wide range of values of d. One should 
note that A-IMR seems to provide good approximations either for the value 6 — 0.9 which is 
greater than the theoretical bound 0.5 ensuring the convergence of the gradient-type algorithm. 
Concerning the greedy approach, we observe a significant difference between A-IMR and CMR. We 
note that A-IMR is close to the optimal SVD up to a certain rank (depending on 6) after which the 
convergence rate decreases but remains better than the one of CMR. Finally, one should note that 
using a precision S = 0.9 for A-IMR yields to less accurate approximations than CMR. However, 
A-IMR provides better results than CMR once the precision 6 is lower than 0.5. 
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Figure 2: Comparison of minimal residual methods for 5a- = TZr{X) and \\-\\x = ||'||2- Convergence 
with the rank r of the approximations obtained with CMR or A-IMR with different precisions S, 
and with direct (left) or greedy rank-one (right) approaches. 



7.2.2 Weighted norm | ||„, 

Here, we perform the same numerical experiments as previously using the weighted norm || • ||x = 
II • llu,, with w equal to 10^ on I? == [0.15, 0.25] x [0.45, 0.55] and w — 1 onn\D. The convergence 
curves with respect to the rank are plotted on Figure 3. The conclusions are similar to the case 
II'I|x = ||'||2j although the use of the weighted norm seems to slightly deteriorate the convergence 
properties of A-IMR. However, the direct A-IMR still provides better approximations than the 
direct CMR, closer to the reference SVD for different values of precision 5. 



7.2.3 Interest of using a goal-oriented weighted norm 

Here, we illustrate the interest of using the weighted norm rather than the natural canonical norm 
when one is interested in computing a quantity of interest. For the sake of readability, we let m^, 
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Figure 3: Comparison of minimal residual methods for Sx — TZr{X) and || • \\x = \\ ■ \\w Conver- 
gence with the rank of the approximations obtained with CMR or A-IMR with different precisions 
S, and with direct (left) or greedy rank-one (right) approaches. 



(resp. U2) denote the best approximation of u in TZj.{X) with respect to the norm || • W^^ (resp. 
II • II2). Figure 4 illustrates the convergence with r of these approximations. We observe that 
approximations u^ and U2 are of the same quaUty when the error is measured with the norm 1 1 • 1 1 2 , 
while Uuj is a far better approximation than U2 (almost two orders of magnitude) when the error 
is measured with the norm || • ||„. We observe that Uw converges faster to u with || • \\^ than M2 
with II • II2. For example, with a rank r = 9, Uw has a || • ||u,-error of 10^ while U2 has a || • ||2-error 
of 10^. On Figure 5, plotted are the spatial modes of the rank-r approximations U2 and u^. These 
spatial modes are significantly different and obviously capture different features of the solution. 
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Figure 4: Convergence of best rank-r approximations U2 and u^ 
the natural canonical norm || • II2 or the weighted norm || • ||u,. 



of the solution u measured with 



Now, we introduce a quantity of interest Q which is the spatial average of u on subdomain D: 



Qiu) 



1 

W\ 



u dx. 



D 



Due to the choice of norm, u^ is supposed to be more accurate than U2 in the subdomain D, and 
therefore, Q{um) is supposed to provide a better estimation of Q{u) than Q{u). This is confirmed 
by Figure 6, where we have plotted the convergence with the rank of the statistical mean and 
variance of Q{uw) and Q{u2). With only a rank r = 5, u^ gives a precision of 10~^ on the 
mean, whereas U2 gives only a precision of 10~^. In conclusion, we observe that a very low-rank 
approximation S^, is able to provide a very good approximation of the quantity of interest. 
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Figure 5: Comparison of the first spatial modes of the rank-r approximations u and u^ 
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(a) Error on the mean value of Q{u) 



(b) Error on the variance of Q(u) 



Figure 6: Convergence with the rank of the mean (left) and variance (right) oi Q{u2) and Q{uw)- 
Relative error with respect to the mean and variance of the reference solution Q{u). 

7.3 Properties of the algorithms 

Now, we detail some numerical aspects of the proposed methodology. We first focus on the 
gradient-type algorithm, and then on evaluations of the map A for the approximation of residuals. 



7.3.1 Analysis of the gradient-type algorithm 

The behavior of the gradient-type algorithm for different choices of norms || • ||x is very similar, 
so we only illustrate the case where || • ||x = || • lb- The convergence of this algorithm is plotted 
in Figure 7 for the case Sx — 7?.io(-^)- H is in very good agreement with theoretical expectations 
(Proposition 4.3): we first observe a linear convergence with a convergence rate that depends 
on (5, and then a stagnation within a neighborhood of the solution with an error depending on 
6. The gradient-type algorithm is then applied for subsets Sx = 7?.,(X) with different ranks 
r. The estimate of the linear convergence rate p is given in Table 1. We observe that for all 
values of r, p takes values closer to 5 than to the theoretical bound 26 of Proposition 4.3. This 
means that the theoretical bound of the convergence rate overestimates the effective one, and the 
algorithm converges faster than expected. Now, in order to evaluate the quality of the resulting 
approximation, we compute the error after the stagnation phase has been reached. More precisely, 
we compute the value 

||u'=-m||x 



7fc 



1, 



\x 



for k 



100. Values of 7100 are summarized in Table 2 and are compared to the theoretical upper 



bound 7 ~ 26/(1 — 26) given by Proposition 4.3. Once again, one can observe that the effective 
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Figure 7: Convergence of the gradient-type algorithm for different values of the relative precision 
6,iorSx^nio{X)&nd\\-\\x = \\-\\2- 
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Table 1: Estimation of the convergence rate p of the gradient-type algorithm (during the linear 
convergence phase) for different subsets Sx = Ti-riX), and for \\ ■ \\x = \\ ■ \\2- 

error of the resulting approximation is lower than the predicted value regardless the choice of 

UriX). 
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Table 2: Final approximation errors (estimated by 7100) for different subsets Sx — 'R-r{X) and 
different precisions 5. Comparison with the theoretical upper bound 2(5/(1 — 25). 

Now, we focus on numerical estimations of the error ||?i — w'^'Hx- It has been pointed out in 
Section 4.4 that e*^, defined in Eq. (39), should provide a good error estimator with effectivity 
index t^ G (1,(1 — S"^)^^/"^). For 5 = 0.2 and Sx = TZio{X), numerical values taken by f'^ 
during the gradient-type algorithm are plotted on Figure 8 and are compared to the expected 
theoretical values of its lower and upper bounds 1 and (1 — S^)~^^^ respectively. We observe 
that the theoretically upper bound is strictly satisfied, while the lower bound is almost but not 
exactly satisfied. This violation of the theoretical lower bound is explained by the fact that the 
precision S is not satisfied at each iteration of the gradient-type algorithm due to the use of a 
heuristic convergence criterium in the computation of residuals (see next section 7.3.2). However, 
although it does not provide a strictly controlled error estimation, the error indicator based on 
the computed residuals is of very good quality. 

7.3.2 Application of A* for the approximation of residuals 

Here, we study the behavior of the greedy algorithm described in Section 5.2.2 for the computation 
of an approximation y^^ = A^{r'') of the residual r*"' during the gradient-type algorithm. We first 
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Figure 8: Effectivity index t^ of the error estimator e'^ at different iterations k of the gradient-type 
algorithm, with Sx = Ti-ioiX) and S — 0.2. 

validate the ability of the heuristic stopping criterium (44) to ensure a prescribed relative precision 
S. Let M — M{S) denote the iteration for which the condition e^ < S is satisfied. The exact 

||r*''|JY is computed using a reference computation of r'^, and we 
e'^j/cM- Figure 9 shows the convergence of this effectivity index 

,„. We 



relative error ejv/ = Whm ~ ^'^\\y/ 
define the effectivity index A^^ = 
with respect to p, when using the natural canonical norm 



or the weighted norm 



observe that A^j tends to 1 as p — >■ oo, as it was expected since the sequence {ym}m>i converges to 
r^. However, we clearly observe that the quality of the error indicator differs for the two different 
norms. When using the weighted norm, it appears that a large value of p (say p > 20) is necessary 
to ensure A^^ G [0.9, 1], while p < 10 seems sufficiently large when using the natural canonical 
norm. That simply reflects a slower convergence of the greedy algorithm when using the weighted 
norm. 
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Figure 9: Evolution with p of the effectivity index A^^ for different S at step A: = 1 of the gradient- 
type algorithm with Sx = 7?-io(^) and for the natural canonical norm (left) or the weighted norm 
(right). 



Remark 7.1. One can prove that at step k of the gradient-type algorithm, when computing an 
approximation y\j of r^ with a greedy algorithm stopped using the heuristic stopping criterium 
(44), the effectivity index f^ of the computed error estimator e^ is such that 
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l-<52 ' 
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where X\,^ 



is the effectivity index of error indicator e\j (supposed such that 5/X^j^j < 1). That 
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provides an explanation for the observations made on Figure 8. 

Now, we observe in Table 3 the number of iterations of the greedy algorithm for the approxi- 
mation of the residual r^ with a relative precision 5, with a fixed value p — 2{) for the evaluation 
of the stopping criterium. The number of iterations corresponds to the rank of the resulting ap- 
proximation. We note that the required rank is higher when using the weighted norm. It reflects 
the fact that it is more difficult to reach precision 5 when using the weighted norm rather than 
the natural canonical norm. 
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Table 3: Computation of A''(r'^) for different precisions 5 and at different steps k of the gradient- 
type algorithm, with Sx = ^10 (direct approach). The table indicates the number of greedy 
corrections computed for reaching the precision 5, controlled using the heuristic stopping criterium 
(44) with p = 20. 



8 Conclusion 

In this paper, we have proposed a new algorithm for the construction of low-rank approximations 
of the solution of high-dimensional weakly coercive problems formulated in a tensor space X. This 
algorithm is based on the approximate minimization (with a certain precision S) of a particular 
residual norm on given low-rank tensor subsets Sx, the residual norm coinciding with some mea- 
sure of the error in solution. Therefore, the algorithm is able to provide a quasi-best low-rank 
approximation with respect to a norm || • ||x that can be designed for goal-oriented computations. 
A weak greedy algorithm using this minimal residual approach has been introduced and its con- 
vergence has been proved under some conditions. A numerical example dealing with the solution 
of a stochastic partial differential equation has illustrated the effectivity of the method and the 
properties of the proposed algorithms. Some technical points have to be addressed in order to 
apply the method to a more general setting and to improve its efficiency: the development of effi- 
cient solution methods for the computation of residuals when using general norms |] • |]x (that are 
not induced norms in the tensor space X), the introduction of robust error estimators during the 
computation of residuals (for the robust control of the precision 5), the application of the method 
for using tensor formats adapted to high-dimensional problems (such as Hierarchical formats). 
Also, a challenging perspective consists in coupling low-rank approximation techniques with adap- 
tive approximations in infinite-dimensional tensor spaces in order to provide approximations of 
high-dimensional equations (PDEs or stochastic PDEs) with a complete control on the precision 
of quantities of interest. 
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